Hopper Hierarchical Flow Improves Turnaround in Physical Design of Large IC
By Paul Rodman
Integrated System Design
Posted 07/11/01, 11:49:33 AM EDT
Today's problems in chip design are related to flow, not tools. Building an in-house flow -- the successful interplay of tools, data and people -- has become increasingly difficult because there aren't enough skilled people even as physical designs (such as SoCs) keep growing more complex. And if that isn't enough, there are deep-submicron semiconductor processes to consider, as well as the profusion of tools that have become mandatory. It's clear that engineers need more than just a bag of tools; they need a starting point built upon the collective flow and software expertise of the best-in-class design community.
Take the flow we used to design a high-performance graphics chip for 3dfx Interactive. It's based on proprietary physical-design automation software called Hopper. Hopper makes practical a new automated physical-design flow that offers:
- Abutted hierarchical design,
- Concurrent design,
- Automation of all tasks and easier “what if?” experimentation.
Hopper is built on top of commercially available tools such as Avanti's Apollo, Hercules and StarRC-XT, as well as proprietary tools that perform such tasks as repeater insertion and clock distribution. Hopper is essentially an automation engine that enables ReShape to capture physical-design flow know-how, with tool-specific knowledge (the best default settings) and design-specific knowledge (tool parameters and event ordering for a specific chip; see Fig. 1, page 28).
Our challenge was to make Hopper meet the following characteristics of the 3dfx graphics chip:
- fabbed using Taiwan Semiconductor Manufacturing Co.'s 0.18-micron process
- six-layer metal
- 1.5 million placeable objects
- 30 million transistors
- 200 RAMs, four PLLs, three D/A converters, two AGPs
- 18 blocks (12 core, four pad-ring blocks)
- 18 different clocks, up to 533 MHz (typically 200 to 350 MHz)
- largest block, 250,000 placeable objects
- over 10,000 repeaters added.
Designs like these are growing so quickly that they are outpacing EDA tool capacity, which makes hierarchical physical design a necessity. The smaller netlists that result from a hierarchical approach translate to shorter run-times, higher tool reliability (fewer core dumps), improved quality of results and greater determinism from run to run.
But more important, in a hierarchical approach the design team gains the benefits of block-level parallelism, which makes it possible to employ people and tools more effectively. Block-level design enables different designers to work in parallel more effectively on the same chip.
A hierarchical flow also supports more deterministic results from run to run. By definition, dividing the chip into blocks limits the potential dispersion of cells and therefore reduces the potential for radical timing changes or congestion changes. In a traditional flat flow, however, there is no guarantee that those cells will stay within the same proximity of one another; therefore, every run with minor changes may cause new problems to surface.
One disadvantage of hierarchical design is that many optimizations don't occur, because the blocks are separated and the changes that must be made are less apparent to the engineer. This is the “horizon effect,” and it can produce poor-quality results. A number of tasks suffer from the horizon effect:
- pin assignment
- circuit rules (e.g., max transition)
- timing problems
- verification problems (e.g., antenna rules)
- clock distribution
- power distribution.
ReShape's flow performs hierarchical physical design without these problems. Because our flow uses previous outputs as the input of the next run, along with the latest changes, we are able to see how the blocks fitted together last time and leverage that history to refine our layout. The traditional flow tries to produce the optimal layout from one run; ours allows a series of runs that produce more optimized layouts each time. In a sense, the tools become smarter with each run and are able to use the previous layouts to avoid the horizon effect.
A traditional hierarchical flow relies on channels --
open lanes of empty space left between all the blocks --
to provide a pathway for connections for last-minute
design fixes. Channels are undesirable for three reasons:
- They create potential coupling problems because
they implicitly bundle wires. This increases the risk the
chip will not run at speed, and it may not run at all.
- Top-level nets travel longer distances since they must go around, rather than through, the blocks to reach their destination. These longer distances can negatively affect timing.
- They waste chip area.
ReShape's flow solves the channel problem by removing channels and using abutted blocks (Fig. 2, page 30). This optimizes block interconnections because signals cut through the blocks, utilizing the extra metal resources within those blocks. Consequently, there are no spaces between blocks. This allows for a more compact physical design and shorter wires, resulting in shorter paths, greater reliability and faster operation.
Hierarchical design enables concurrent physical and logic design. In a traditional flow, the back-end design team must wait until the front-end RTL design team has finished, resulting in both schedule and quality problems. For example, several logical blocks of a chip may already be complete while completion of others may be months away. In a typical flat methodology, the physical-design team would not have access to a complete netlist and therefore experimentation -- with meaningful results -- would not be possible.
Concurrent design
Without back-end feedback, the functional-design team can unknowingly create problems that are too difficult or costly to solve once the chip goes to physical design. But concurrent functional and physical design permits the physical-design team to start its process as soon as the main structure of some of the netlists is determined, which can be several months (or even a year) before the completion of the entire front-end design.
Starting physical design early enables the front-end team to refine the RTL to address problems generated by the physical-design process. Front-end designers make decisions that affect the physical design; therefore, the ideal methodology would provide them with information about the physical design on which to base those decisions. Early feedback about designs can produce a chip of much higher quality. In fact, with deep-submicron designs needing several repeater insertion delays just to cross the chip, early experimentation with the floor plan becomes essential.
Concurrent design makes sense because the main structures of a block-level netlist generally emerge early in the design operation. The remainder of functional-design time is usually spent implementing the control logic, verifying the design and making minor bug fixes, but these changes do not usually have a major impact on the behavior of the netlist in the back end. With this in mind, why not give the physical-design team access to the parts of the logic that are complete and derive the benefits from concurrent design (Fig. 3, page 32)?
In our case, concurrent front-end and back-end design enabled the RTL design team to make changes at a
more convenient place in the design process and solve
back-end problems more efficiently. We found this to be
an important advantage.
For example, in the largest block of the design, we were plagued by congestion or hot spots that prevented a clean route. Inspection revealed that the portion of the netlist hierarchy in that area contained an enormous number of high-fanout nets acting as selects to AOI gates actually functioning as 2:1 multiplexers. Our Synopsys tools had chosen the AOI gate because it looked slightly better on paper.
Two fixes were implemented to solve this problem. We changed the synthesis script to use the infer-mux directive, which reduced the number of high-fanout nets by a factor of two. And we added a pass of buffer tree optimization to the flow for this block.
By discovering these kinds of back-end obstacles
early, we caught the RTL design team while it could still
make changes relatively painlessly.
In deep-submicron designs, the difference between wire-load models and reality is huge, making the models difficult or impossible to use for timing convergence. Some design teams will be conservative and build in margin to cover the difference. Unfortunately, with today's processes this approach often isn't feasible.
The ReShape flow creates per-block wire-load models, which are used for synthesis only. The logic engineer ignores the synthesis timing reports (except for simple A vs. B netlist comparisons), instead converging on the post-placement timing we provide from the flow. (This placement-based timing has been correlated to at least one previous full-route/full-extraction run.)
The wire model is used only to create a netlist with the appropriate level of stiffness for optimal back-end timing convergence. We found that once it was set properly, engineers could inject new netlists into the flow and evaluate RTL or synthesis changes with real data in a few hours.
After a run, engineers want to know if the chip is converging on timing and if it has any routing-congestion problems. With the ReShape flow, we did the whole loop in just a few hours for most blocks. For example, a block with 100,000 placeable instances took about 10 hours to completely converge on timing and routing. This same process could take days in a traditional methodology. At some point the RTL converges on a final set of netlists and the push to tapeout is on.
Due to the automation and the hierarchical design process, we were able to build the entire 3dfx chip from scratch in 24 hours. Starting from gate-level netlists (the netlists alone were over 1 Gbit) and with the previous floor plan and flow configuration checked out from the source tree, we spawned more than 4,000 individual jobs, with all blocks placed and routed and timing converged. The ReShape tools and the Avanti runs created more than 10,000 files.
If network or hardware problems caused a crash, the
flow could be automatically restarted and would resume
execution where it left off.
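The article does not describe how Hopper implements this restart. As a minimal sketch of the general idea, assuming a hypothetical run_flow function, made-up step names and a JSON state file (none of which come from the ReShape tools), each finished step can be recorded on disk so that a rerun skips it:

```python
import json
from pathlib import Path


def run_flow(steps, state_file):
    """Run (name, action) steps in order, skipping steps already recorded as done.

    Hypothetical sketch: the state file is a JSON list of completed step names.
    Returns the names of the steps actually executed in this call.
    """
    state_path = Path(state_file)
    done = set(json.loads(state_path.read_text())) if state_path.exists() else set()
    executed = []
    for name, action in steps:
        if name in done:
            continue  # finished in an earlier attempt, so do not repeat it
        action()  # may raise; nothing is recorded for a step that fails
        done.add(name)
        # Persist progress after every step so a crash loses at most one step.
        state_path.write_text(json.dumps(sorted(done)))
        executed.append(name)
    return executed
```

If a step raises partway through, calling run_flow again with the same state file picks up at the failed step rather than at the beginning.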
Automatic steps
One of the most important advantages of the ReShape flow is the automation of thousands of manual steps normally required for a hierarchical physical design. The flow provides a framework for adding special-value automation in incremental stages to solve the many problems that come up while building a block. We have identified time-consuming tasks that could be automated and we have developed code that's incorporated into the flow to handle those tasks. Automation not only saves a significant amount of design time, but it is also based on previous chip successes and on proven configurations, enabling a design team to fine-tune these settings over time to ensure the highest tool performance.
Inherited knowledge
For example, we divide the placement process into several discrete steps. The first is preparing the command file. Our flow opens our database and studies the block, using controls that have been defined according to user preferences, then automates the production of a command file that encapsulates all our learning about the best way to perform the placement tasks for that block. Any person who uses the flow inherits the benefits of any knowledge from the flow builders -- and perhaps that block's previous builder.
Another example is the automation of log file review. The log file is a critical line of communication from vendor tools to inform the user about the results of each task execution. If you do not go through every line in the log file, you may miss a single-line message, among tens of thousands of lines, that indicates a problem. The sad implications of this message may not be obvious for days or weeks. The ReShape flow has embedded log-checking software that automatically opens and reads the log, looking for indications of errors, highlighting them and stopping the process.
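To illustrate the kind of check described above, here is a small Python sketch. It is not the ReShape log checker; the patterns, the function names and the LogCheckFailed exception are assumptions made for the example, and real vendor tools each have their own message formats.

```python
import re

# Hypothetical patterns; a real flow would keep a per-tool list of these.
ERROR_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"\berror\b", r"\bfatal\b", r"core dumped")
]


class LogCheckFailed(Exception):
    """Raised to stop the flow when a log contains a suspicious line."""


def check_log(lines):
    """Return (line_number, text) for every line matching an error pattern."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(lines, start=1)
        if any(pattern.search(line) for pattern in ERROR_PATTERNS)
    ]


def check_or_stop(lines):
    """Highlight the offending lines and stop by raising LogCheckFailed."""
    hits = check_log(lines)
    if hits:
        report = "\n".join(f"line {number}: {text}" for number, text in hits)
        raise LogCheckFailed(report)
```

Calling check_or_stop on the lines of a log after each tool run turns a single buried message into an immediate, visible failure instead of a surprise weeks later.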
No matter how good design tools are, new physical-design challenges usually emerge on new projects or with a new library or process. By nature, EDA vendors can address the needs of only some of them. But the ReShape flow lets us create special-purpose code to deal with special needs, then configure it and “click it in” to the flow. The flow acts as a powerful framework for adding such tools.
For example, in the 3dfx chip, there were several AGP and SDRAM buses with very tight clock-skew specs. On previous chips built by 3dfx, skew was handled with manual editing. However, if the pad ring needed to be changed -- e.g., the core size changed or the pads moved around -- the hand layout could not be used. So the manual layout would need to be redone completely to fit the new design.
To solve this problem, ReShape wrote a “point tool” to handle the AGP bus layout, and the code was replayable when changes were made to the floor plan. So we could change the physical design of the chip, make it larger or smaller with the push of a button, and all the previously configured data would replay and build the balanced buses that we needed for the AGP spec each time. This gave us the flexibility to experiment by changing the chip size without having to be concerned about the intricate layout of the balanced buses.
In fact, the entire construction of the pad ring is often one of the most manual processes in building a chip. But we have developed a library of configurable point tools that we can use to create a replayable flow for all the steps in assembling the pad ring.
The ultimate test of a flow is how quickly you are able to implement last-minute changes. One of the most important advantages of the hierarchical design flow is the ability to respin a block from a new netlist without affecting the rest of the chip. In a flat design methodology, any change to any part of the chip could potentially affect the entire chip, requiring a tremendous redesign effort that could incapacitate the chip.
The hierarchical approach provides a more deterministic path to the final physical design, allowing us to deal with last-minute netlist changes. For example, we changed 30,000 gates of the design just three weeks before tapeout. This was a critical fix, adding an important new feature to comply with the latest graphics standards, and was essential in terms of marketing the chip. Yet as important as this fix was, it affected only three blocks in the design out of a total of 22 blocks, so the engineers isolated the affected blocks and resynthesized only those three blocks without having to touch the rest of the chip -- all the other blocks were still considered on the shelf. A flat flow would have required rebuilding the whole chip -- and a potentially large delay.
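One simple way to decide which blocks need a respin is to compare each block's current netlist against a fingerprint of the netlist it was last built from. The Python sketch below shows that idea only; the article does not say how the ReShape flow tracks changes, and the function names and block names here are hypothetical.

```python
import hashlib


def netlist_digest(netlist_text):
    """Fingerprint a block's netlist so changes can be detected cheaply."""
    return hashlib.sha256(netlist_text.encode("utf-8")).hexdigest()


def blocks_to_respin(netlists, built_digests):
    """Return the sorted names of blocks whose netlist changed since the last build.

    netlists:      block name -> current netlist text
    built_digests: block name -> digest recorded when the block was last built
    A block with no recorded digest has never been built, so it is included.
    """
    return sorted(
        name
        for name, text in netlists.items()
        if built_digests.get(name) != netlist_digest(text)
    )
```

Blocks that are not returned stay "on the shelf": their existing layout is reused untouched.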
Shrinks and variants
Even as the ReShape flow has significant short-term advantages for a chip being designed today, it has advantages for future chips. Because the flow collects and
incorporates knowledge about the process and the chip
during the design process, it leverages that knowledge. In this way, future chips on the same process or similar
chips on a different process can stand on the shoulders
of the previous work.
The proof of the value in using a flow came only a month after tapeout, when 3dfx used the flow to tape out another chip in the same process. That chip had approximately 700,000 placeable objects, and three more chips were in design with a new 0.15-micron flow at the time of 3dfx's demise.
Respins of a chip in a new process are aided by the fact that all the floor-planning information is represented so that it relates to previous work. In addition, all dimensions are specified as much as possible with process-scalable parameters such as units of wire pitches. This makes it easier to get a resynthesis of the chip up and running in a new flow with the same floor plan (only smaller, of course). We have switched a chip from 0.25 to 0.18 micron in just two days.
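The benefit of process-scalable dimensions can be shown with a short Python sketch. The floor-plan entries and the wire-pitch values below are illustrative numbers chosen for easy arithmetic, not real process data, and the function name is hypothetical.

```python
def to_microns(dims_in_pitches, wire_pitch_um):
    """Convert floor-plan dimensions from wire pitches to microns.

    dims_in_pitches: name -> size expressed in units of wire pitch
    wire_pitch_um:   the routing pitch of the target process, in microns
    """
    return {name: pitches * wire_pitch_um for name, pitches in dims_in_pitches.items()}


# The floor plan is written once, in process-independent units (made-up values).
floorplan = {"core_width": 4000, "core_height": 3000}

# Retargeting is a single parameter change; both pitches are assumed values.
older_process = to_microns(floorplan, wire_pitch_um=0.5)
newer_process = to_microns(floorplan, wire_pitch_um=0.25)
```

Because the floor plan never mentions microns directly, moving to a new process means changing one number rather than editing every dimension.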
New chips often have blocks that are recycled from
previous designs. In our flow, the floor-planning code for
these blocks is also highly recyclable.
Managing data
A traditional hierarchical design flow also presents a data-management challenge. The number of scripts, command files and databases required increases by N-fold with an N-block hierarchical flow. Even though each block is very manageable in size and complexity, the number of jobs creates significantly more work for the design team. If any changes to the floor plan are required, such as block size changes or movement, all of the block-level scripts and command files require regeneration in the traditional hierarchical flow.
The ReShape flow centralizes, organizes and automates the generation of all these block-specific objects. Thousands of tool settings must be set to intelligent defaults; however, we have to be able to change any of them at any point in the flow. To solve this problem, we use a hierarchy of configuration files to control settings based on the process used, the particular chip being built and the particular block within the chip. Instead of hundreds of scripts with knob settings dispersed throughout, we have a small number of centralized files that can be easily kept under revision control.
We think of this bundle of context-dependent, variable settings as a “tech object.” When we need to bring up a new process, we can debug the settings for this tech object and then export it and do a chip respin in a new process by reinstantiating the flow with it.
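A layered lookup of this kind can be sketched in Python with the standard library's ChainMap, which searches a list of dictionaries in order and returns the first match. This is an illustration of the idea only, not the format of the ReShape configuration files; the setting names and values are invented for the example.

```python
from collections import ChainMap

# Hypothetical settings at each level; more specific levels override general ones.
tool_defaults = {"placer_effort": "medium", "max_transition_ns": 0.5, "route_layers": 4}
process_cfg = {"route_layers": 6}        # per-process overrides
chip_cfg = {"placer_effort": "high"}     # per-chip overrides
block_cfg = {"max_transition_ns": 0.3}   # per-block overrides


def resolve_settings(block, chip, process, defaults):
    """Return a view in which block beats chip, chip beats process, process beats defaults."""
    return ChainMap(block, chip, process, defaults)


settings = resolve_settings(block_cfg, chip_cfg, process_cfg, tool_defaults)
```

In this sketch, swapping in a different process_cfg dictionary plays the role of retargeting to a new process: every block picks up the new process-level values while keeping its own chip-level and block-level overrides.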
The flow also allows users to share data while the block is being developed. For example, typically one person is in charge of the floor plan for the chip and the top-level power and clock distribution. This person then exports the per-block context to all the block owners on the chip. They are each able to build a complete copy of the chip from this imported context, although they typically work only on the blocks they are responsible for. In the end, the chip could be taped out from any of the block views since they all have the same pin abutments, power hookups and other global objects.
In our case, the results speak for themselves. When the 3dfx Interactive chip came back from Taiwan Semiconductor it was placed in the board and it worked at full speed -- a testament to the design team's rigorous functional and timing-verification methodology, Hopper and the ReShape physical-design flow (Fig. 4).